Combining Entropy Based Heuristics with Minimax Search and Temporal Differences to Play Hidden State Games
Authors
Abstract
In this paper, we develop a method for playing variants of spatial games such as chess or checkers in which the opponent's state is only partially observable. Each side has a number of hidden pieces that are invisible to the opposition. We estimate the probability distribution over opponent states by assuming that moves are made to maximize the entropy of the subsequent state distribution, or belief. The belief state of the game at any time is specified by a probability distribution over the opponent's states and, conditional on any one of these states, a distribution over our states; the latter is our estimate of the opponent's belief about our state. From these distributions we can calculate the relative uncertainty, or entropy balance. We use this information balance, along with other observable features and belief-based minimax search, to approximate the partially observable Q-function. Gradient descent is used to learn the advisor weights.
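As a rough illustration of the entropy balance mentioned in the abstract, the following minimal sketch computes Shannon entropies over discrete beliefs. It is not the authors' implementation: the function names `entropy` and `entropy_balance`, the choice of bits as the unit, and the sign convention (the opponent's expected uncertainty about us minus our uncertainty about the opponent) are all assumptions made for this example.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a discrete distribution (list of probabilities)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def entropy_balance(opp_belief, our_belief_given_opp):
    """Hypothetical entropy balance (assumed definition, not taken from the paper).

    opp_belief[i]           -- our probability that the opponent is in state i
    our_belief_given_opp[i] -- estimated opponent belief over our states,
                               conditional on the opponent being in state i
    A positive value means the opponent is, in expectation, more uncertain
    about our state than we are about theirs.
    """
    our_uncertainty = entropy(opp_belief)
    opp_uncertainty = sum(
        p * entropy(cond) for p, cond in zip(opp_belief, our_belief_given_opp)
    )
    return opp_uncertainty - our_uncertainty


# We consider two opponent states equally likely (1 bit of uncertainty);
# in either case the opponent is unsure among four of our states (2 bits).
opp_belief = [0.5, 0.5]
our_belief_given_opp = [[0.25] * 4, [0.25] * 4]
balance = entropy_balance(opp_belief, our_belief_given_opp)
```

In a setup like the one the abstract describes, a quantity of this kind could serve as one input feature among several to a learned evaluation function; how the paper actually weights or combines it is not specified here.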
Similar resources
TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search
ABSTRACT In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(λ) and another less radical variant, TDdirected(λ). In particular, our chess program, “KnightCap,” used TDLeaf(λ) to learn its evaluati...
Comparing Minimax and Product in a Variety
This paper describes comparisons of the minimax backup rule and the product backup rule on a wide variety of games, including P-games, G-games, three-hole kalah, Othello, and Ballard’s incremental game. In three-hole kalah, the product rule plays better than a minimax search to the same depth. This is a remarkable result, since it is the first widely known game in which product has been found ...
Simulation Control in General Game Playing Agents
The aim of General Game Playing (GGP) is to create intelligent agents that can automatically learn how to play many different games at an expert level without any human intervention. One of the main challenges such agents face is to automatically learn knowledge-based heuristics in realtime, whether for evaluating game positions or for search guidance. In recent years, GGP agents that use Monte...
A minimax search algorithm for CDHMM based robust continuous speech recognition
In this paper, we propose a novel implementation of a minimax decision rule for continuous density hidden Markov model based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm, where the minimax decision rule is repetitively applied to determine the partial paths during the search procedure. ...
A Simulation-Based General Game Player
The aim of General Game Playing (GGP) is to create intelligent agents that can automatically learn how to play many different games at an expert level without any human intervention. The traditional design model for GGP agents has been to use a minimax-based game-tree search augmented with an automatically learned heuristic evaluation function. The first successful GGP agents all followed that ...